SA-SSR: a suffix array-based algorithm for exhaustive and efficient SSR discovery in large genetic sequences
Abstract
Simple sequence repeats (SSRs) are used to address research questions in many fields (e.g. population genetics, phylogenetics and forensics) because of their high mutability within and between species. Here, we present an innovative algorithm, SA-SSR, based on suffix and longest common prefix arrays for efficiently detecting SSRs in large sets of sequences. Existing SSR detection applications are hampered by one or more limitations (e.g. speed, accuracy or ease of use). Our algorithm addresses these challenges while being the most comprehensive and correct SSR detection software available. SA-SSR is 100% accurate and detected >1000 more SSRs than the second best algorithm, while offering greater control to the user than any existing software.
Availability and implementation: SA-SSR is freely available at http://github.com/ridgelab/SA-SSR
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
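To make the idea of SSR detection with suffix and longest common prefix (LCP) arrays concrete, here is a minimal Python sketch. It is not the SA-SSR implementation, and the abstract does not describe the authors' actual procedure. All function names and parameters (`suffix_array`, `lcp_array`, `common_prefix`, `find_ssrs`, `min_period`, `max_period`, `min_copies`) are hypothetical. The sketch relies on one observation: if the suffixes starting at positions i and i + p share a prefix of length L, then the sequence has period p over p + L characters starting at i, which is L // p + 1 whole copies of the motif. The construction is deliberately naive and only suitable for toy inputs.

```python
def suffix_array(s):
    """Naive construction by sorting suffixes; fine for a toy example only."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """Kasai's algorithm. lcp[r] is the common prefix length of the
    suffixes at ranks r-1 and r; also returns the inverse array rank."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp, rank


def common_prefix(lcp, rank, i, j):
    """LCP of suffixes i and j (i != j): minimum of lcp over the ranks
    between them. A linear scan here; a real tool would use a faster query."""
    lo, hi = sorted((rank[i], rank[j]))
    return min(lcp[lo + 1:hi + 1])


def is_primitive(motif):
    """True if motif is not itself a repetition of a shorter unit."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_period=1, max_period=6, min_copies=2):
    """Return sorted (start, motif, copies) for each maximal tandem repeat.
    Assumed parameters, chosen for illustration only."""
    sa = suffix_array(seq)
    lcp, rank = lcp_array(seq, sa)
    hits = []
    for p in range(min_period, max_period + 1):
        for i in range(len(seq) - p):
            # Skip starts that only continue a run with the same period.
            if i > 0 and seq[i - 1] == seq[i - 1 + p]:
                continue
            motif = seq[i:i + p]
            if not is_primitive(motif):
                continue
            copies = common_prefix(lcp, rank, i, i + p) // p + 1
            if copies >= min_copies:
                hits.append((i, motif, copies))
    return sorted(hits)


print(find_ssrs("GGACACACTTTG"))
# [(0, 'G', 2), (2, 'AC', 3), (8, 'T', 3)]
```

Each tuple gives the 0-based start position, the repeated motif, and the number of whole copies. Raising `min_copies` to 3 drops the `GG` hit and keeps only the `AC` and `T` repeats.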
Related resources
Genetic diversity analysis of recombinant inbred lines of rice (Oryza sativa L.) using microsatellite markers
Estimation of genetic diversity is an important factor in germplasm conservation and characterization. In rice breeding programs, genetic diversity information on specific regions of genome can be very useful for the application of marker assisted selection (MAS) and for gene mapping. A total of 152 rice lines were considered for breeding programs using microsatellites (SSR) technique. The tota...
Mining for SNPs and SSRs using SNPServer, dbSNP and SSR taxonomy tree.
Molecular genetic markers represent one of the most powerful tools for the analysis of genomes and the association of heritable traits with underlying genetic variation. The development of high-throughput methods for the detection of single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) has led to a revolution in their use as molecular markers. The availability of large sequ...
SSRPrimer and SSR Taxonomy Tree: Biome SSR discovery
Simple sequence repeat (SSR) molecular genetic markers have become important tools for a broad range of applications such as genome mapping and genetic diversity studies. SSRs are readily identified within DNA sequence data and PCR primers can be designed for their amplification. These PCR primers frequently cross amplify within related species. We report a web-based tool, SSR Primer, that inte...
Transcriptomic analysis, genic SSR development, and genetic diversity of proso millet (Panicum miliaceum; Poaceae)
PREMISE OF THE STUDY Proso millet (Panicum miliaceum; Poaceae) is a minor crop with good nutritional qualities and strong tolerance to drought stress and soil infertility. However, studies on genetic diversity have been limited due to a lack of efficient genetic markers. METHODS Illumina sequencing technology was used to generate short read sequences of proso millet, and de novo transcriptome...
Identify SSR Regulators for Functional Gene Sets through Cross-Species Comparison
Single sequence repeats (SSRs) are DNA sequences composed of tandem repetitions of relatively short motifs. They are not only considered as genetic markers but also play an important role in gene regulatory networks, one of the greatest challenges of functional genomics. In order to identify key SSR regulators among functional gene sets, we have developed an efficient algorithm for SSR pattern ...